Abstract

In this paper we consider the computation of channel capacity for ergodic multiple-input multiple-output channels with additive
white Gaussian noise. Two scenarios are considered. Firstly, a time-varying channel is considered in which both the transmitter
and the receiver have knowledge of the channel realization. The optimal transmission strategy is water-filling over space and
time. It is shown that this may be achieved in a causal, indeed instantaneous fashion. In the second scenario, only the receiver
has perfect knowledge of the channel realization, while the transmitter has knowledge of the channel gain probability law. In this
case we determine an optimality condition on the input covariance for ergodic Gaussian vector channels with arbitrary channel
distribution under the condition that the channel gains are independent of the transmit signal. Using this optimality condition, we
find an iterative algorithm for numerical computation of optimal input covariance matrices. Applications to correlated Rayleigh
and Ricean channels are given.
I. Introduction

Shannon theoretic results for multiple-input multiple-output (MIMO) fading channels [4, 5] have stimulated a large amount
of research activity, both in the design of practical coding strategies and in extension of the theory itself.

From an information theoretic point of view, the main problem is to find the maximum possible rate of reliable transmission
over $t$-input, $r$-output additive white Gaussian noise channels of the form

$$y[k] = \sqrt{\gamma}\, H[k]\, x[k] + z[k] \tag{1}$$

where $y[k] \in \mathbb{C}^{r \times 1}$ is a complex column vector of matched filter outputs at symbol time $k = 1, 2, \dots, N$ and $H[k] \in \mathbb{C}^{r \times t}$ is
the corresponding matrix of complex channel coefficients. The element at row $i$ and column $j$ of $H[k]$ is the complex channel
coefficient from transmit element $j$ to receive element $i$. The vector $x[k] \in \mathbb{C}^{t \times 1}$ is the vector of complex baseband input
signals, and $z[k] \in \mathbb{C}^{r \times 1}$ is a complex, circularly symmetric Gaussian vector with $E\{z[k] z[k]^\dagger\} = I_r$. The superscript $(\cdot)^\dagger$
means Hermitian adjoint and $I_r$ is the $r \times r$ identity matrix. Let $n = \max\{t, r\}$ and $m = \min\{t, r\}$.

Transmission occurs in codeword blocks of length $N$ symbols. Let $x_N \in \mathbb{C}^{Nt \times 1}$ and $y_N \in \mathbb{C}^{Nr \times 1}$ be the column vectors resulting
from stacking $x[1], x[2], \dots, x[N]$ resp. $y[1], y[2], \dots, y[N]$. Further let $H_N$ be the block-diagonal matrix with diagonal blocks
$H[k]$.

A transmitter power constraint

$$\frac{1}{N} \|x_N\|_2^2 \le 1 \tag{2}$$

is enforced, where $N$ is the codeword block length. This power constraint has been explicitly written out this way to remind
the reader that power constraints such as this, commonly written $E[\|x[k]\|^2] \le 1$, are long-term average power constraints, not
deterministic per-symbol, or per-input constraints, see [6, p. 329]. Accordingly, the signal-to-noise ratio is defined as $\gamma$. The
covariance matrix of input sequences of length $N$ is defined as the $Nt \times Nt$ matrix

$$Q_N = E\{x_N x_N^\dagger\} \tag{3}$$

and hence the power constraint can also be written as $\operatorname{tr}(Q_N) \le N$. Also define the per-symbol input covariance matrices
$Q[k] = E\{x[k] x[k]^\dagger\}$, which appear as principal sub-matrices in $Q_N$. In the case of memoryless transmission, $Q_N$ is a block
diagonal matrix with diagonal blocks $Q[k]$.

The power constraint assumes that the power received from the collection of transmit signals at any point in space (e.g.
at some imaginary point close to the transmitter) is given by the summation of the individual signal powers, i.e. zero mutual
coupling.

There are several possibilities for the amount of side information that the receiver or transmitter may possess regarding the
channel process H[k]. Perfect side information shall mean knowledge of the realizations H[k], while statistical side information
refers to knowledge of the distribution from which the H[k] are selected. Perfect receiver side information will be assumed
throughout the paper.
There are several categories of channels (1) that have been investigated in the literature:

1) Channels in which H[k] is a given sequence of channel matrices, known to both the transmitter and receiver.

2) Ergodic channels in which the H[k], k = 1, 2, ... are random matrices, selected independently of each other and
independently of the x[k], according to some matrix probability density function $p_H$, which is known at the transmitter.
The specific channel realizations are unknown at the transmitter, but are known at the receiver.

Under the assumption of additive Gaussian noise and perfect receiver side information, the optimal input distribution is Gaussian,
and the main problem is therefore the determination of the capacity achieving input covariance matrix $Q_N$.

For a given input covariance, the information rate for case 1 is (adopting a modification of the notation of [4]),

$$\psi(Q_N, H_N) = \frac{1}{N} \log\det\left(I_{Nr} + \gamma H_N Q_N H_N^\dagger\right). \tag{4}$$

The capacity is found by maximizing the information rate.
Problem 1 (Gallager [7]):

$$\max_{Q_N} \psi(Q_N, H_N)$$

subject to

$$\frac{1}{N}\operatorname{tr}(Q_N) \le 1, \qquad Q_N \ge 0.$$

Note that since $\psi$ is a function of $H_N$, the optimal covariance matrix will in general be a function of $H_N$.

Telatar [4] obtained the solution of Problem 1 when $H[k] = H$ for all $k = 1, 2, \dots$. Following Gallager [7], the solution
is obtained by solution of the Kuhn-Tucker conditions, and results in a water-filling interpretation,

$$C = \sum_{i:\, \xi\lambda_i > 1} \log(\xi\lambda_i), \quad \text{where $\xi$ is such that} \tag{5}$$

$$\gamma = \sum_{i=1}^{m} \left(\xi - \frac{1}{\lambda_i}\right)^+ \tag{6}$$

and $\lambda_i$, $i = 1, 2, \dots, m$ are the non-zero eigenvalues of $H H^\dagger$. The optimal transmit covariance matrix is independent of $k$ and
is given by $Q[k] = Q = V^\dagger \Gamma V$, where $V$ is the matrix of right singular vectors of $H$ and $\Gamma = \operatorname{diag}\{\max(0, \xi - 1/\lambda_i)\}$.
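As a rough illustration of the water-filling rule in (5) and (6), the following Python sketch computes the water level for a fixed channel from the non-zero eigenvalues of $HH^\dagger$. The function name `water_fill` and its return convention are illustrative choices made here, not notation from [4] or [7]; rates are in nats, and powers are reported in order of decreasing eigenvalue.

```python
import math

def water_fill(eigenvalues, gamma):
    """Water-filling over the positive eigenvalues of H H^dagger.

    Returns (xi, powers, capacity) where powers[i] = max(0, xi - 1/lambda_i)
    for the eigenvalues sorted in decreasing order, sum(powers) == gamma,
    and capacity is the sum of log(xi * lambda_i) over the active modes.
    """
    lam = sorted((l for l in eigenvalues if l > 0), reverse=True)
    if not lam or gamma <= 0:
        raise ValueError("need a positive eigenvalue and positive power")
    # Try the k strongest modes, dropping the weakest until all k are active.
    for k in range(len(lam), 0, -1):
        xi = (gamma + sum(1.0 / l for l in lam[:k])) / k
        if xi > 1.0 / lam[k - 1] or k == 1:
            break
    powers = [max(0.0, xi - 1.0 / l) for l in lam]
    capacity = sum(math.log(xi * l) for l in lam if xi * l > 1.0)
    return xi, powers, capacity
```

For eigenvalues 1 and 0.5 with $\gamma = 2$ both modes are active and the water level is 2.5; with $\gamma = 0.5$ the weaker mode is switched off.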

The information rate in the ergodic case is $\Psi = E\{\psi\}$ and subject to the assumptions in case 2 above, reduces to a
symbol-wise expectation with respect to $p_H$,

$$\Psi(Q, p_H) = E\left\{\log\det\left(I_r + \gamma H Q H^\dagger\right)\right\} \tag{7}$$

where $Q = Q[k]$ is the $t \times t$ covariance matrix for each symbol. In this case, capacity is found via solution of
Problem 2 (Telatar [4]):

$$\max_{Q} \Psi(Q, p_H)$$

subject to

$$\operatorname{tr}(Q) \le 1, \qquad Q \ge 0.$$

Since $\Psi$ is an expectation with respect to $p_H$, the optimal $Q$ will depend on $p_H$, rather than the realizations $H[k]$.

One common choice for $p_H$ is a Gaussian density. We will use the notation $\mathcal{N}_{t,r}(M, \Sigma)$ to mean a Gaussian density with $r \times t$
mean matrix $M$ and $rt \times rt$ covariance matrix $\Sigma = E\{h h^\dagger\}$ where $h$ is formed by stacking the columns of the matrix into a
single vector. This allows for arbitrary correlation between elements. Common special cases include i.i.d. unit variance entries,
$\mathcal{N}_{t,r}(0, I)$ (corresponding to independent Rayleigh fading) and the so-called Kronecker correlation model $\mathcal{N}_{t,r}(M, R \otimes T)$.
The latter model corresponds to separable transmit $T$ and receive correlation $R$, and may be generated via

$$H = R^{1/2} G\, T^{1/2} + M$$

where $G \sim \mathcal{N}_{t,r}(0, I)$. For $H[k] \sim \mathcal{N}_{t,r}(0, I)$ Telatar showed that the optimizing $Q = I_t/t$, meaning that it is optimal to
transmit independently with equal power from each antenna. Thus in that case

$$C = E\left\{\log\det\left(I_r + \frac{\gamma}{t} H H^\dagger\right)\right\}. \tag{8}$$

Telatar also gave an expression for computation of (8), and several other expressions have subsequently been found [8-10].
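When no closed form is at hand, the expectation in (7) and (8) can be estimated by simulation. The sketch below is one way this might be done with NumPy, under the assumption that zero-mean Kronecker-correlated channels are drawn as $H = R^{1/2} G T^{1/2}$ with i.i.d. unit-variance complex Gaussian $G$ (the usual construction for this model). The helper names `psd_sqrt`, `sample_kronecker` and `ergodic_rate` are hypothetical, and rates are in nats.

```python
import numpy as np

def psd_sqrt(a):
    """Hermitian square root of a positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def sample_kronecker(r, t, R, T, n, rng):
    """Draw n zero-mean channels H = R^{1/2} G T^{1/2}, shape (n, r, t)."""
    g = rng.standard_normal((n, r, t)) + 1j * rng.standard_normal((n, r, t))
    g /= np.sqrt(2.0)                      # unit-variance complex entries
    return psd_sqrt(R) @ g @ psd_sqrt(T)

def ergodic_rate(H, Q, gamma):
    """Monte Carlo estimate of E{log det(I + gamma H Q H^dagger)} in nats."""
    r = H.shape[1]
    Hh = np.conj(np.swapaxes(H, 1, 2))     # per-sample Hermitian transpose
    M = np.eye(r) + gamma * (H @ Q @ Hh)
    _, logdet = np.linalg.slogdet(M)       # batched log|det|
    return float(np.mean(logdet))
```

With `Q = np.eye(t) / t` and `R`, `T` set to identity matrices this estimates the equal-power rate (8); other choices of `Q` give the rate (7) for a candidate covariance.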

Finally, Telatar considered a variation on case 2 with time-invariant $H[k] = H$ and perfect receiver side information, but
only statistical transmitter side information. This requires the notion of outage probability. It was conjectured that the optimal
transmission strategy, minimizing the outage probability, is equal power signals from a subset of antennas. We do not consider
outage probability in this paper.

It is clear from these results that the degree of channel knowledge at the transmitter has a significant effect on the optimal
transmission strategy.

Extensions to the theory have taken several directions, for example extending the ergodic capacity results to channel matrices
whose elements are no longer independent of each other. "One-ring" scatterer models, resulting in single-ended correlation
structure $H \sim \mathcal{N}_{t,r}(0, I \otimes T)$ were considered in [11]. Bounds on capacity were obtained in that work, assuming $Q = I/t$.
Subsequently, a series of papers appeared, adopting the same single-ended correlation model. In [12] it was shown that for
$H \sim \mathcal{N}_{t,r}(0, I \otimes T)$ it is optimal to transmit independently on the eigenvectors of $T$. Majorization results were obtained
showing that stronger modes should be allocated stronger powers, and optimal $Q$ were found using numerical optimizations.
No conditions for optimality were given. In [13], a closed-form solution for the characteristic function of the mutual information
assuming $Q = I/t$ was found for the same single-ended correlation model. In [14], the special case of $t = 2$ was considered,
where optimization of $Q$ could be performed, once again assuming no receiver correlation, $R = I$.

Asymptotic large systems ($r, t \to \infty$ with $r/t$ a constant) capacity results have been obtained in [15], for the more
general case $H \sim \mathcal{N}_{t,r}(0, R \otimes T)$, but under the assumption $Q = I/t$. Asymptotic results for arbitrary $Q$ were considered in
[16], where the asymptotic distribution of the mutual information was found to be normal. Large-systems results have been
obtained in [17], concentrating on the case where the eigenvectors of the optimal $Q$ can be identified by inspection.



Closed form solutions have been obtained for the mutual information of single-ended correlated channels [10, 18] and for
$H \sim \mathcal{N}_{t,r}(0, R \otimes T)$, [19, 20].

Non-zero mean multiple-input, single-output channels were considered in [21, 22]. In those papers, results were obtained for
non-zero mean, in the absence of transmitter correlation, and for non-trivial transmitter correlation, with zero mean. Further
results for non-zero mean channels have been presented in [23], which reports some majorization results on mutual information,
with respect to the eigenvalues of the mean matrix. Exact distributions of mutual information have been obtained for $t = 2$
or $r = 2$. Asymptotic expressions for the mutual information have been presented in [24], for arbitrary $Q$, and non-central,
uncorrelated fading.

Other researchers [25-28] have examined variations on the amount of information available at transmitter and receiver.
Previous work such as [4, 7, 12, 14, 17, 22] on Gaussian vector channels focused on cases when the eigenvectors of the
optimal input covariance can be easily determined by inspection of the channel statistics, and the problem becomes one of
optimizing the eigenvalues of the input covariance. This approach does not lend itself to arbitrary non-deterministic channels:
for example where the channel mean and covariance are not jointly diagonalizable or where the probability density is not in
Kronecker form [29, 30].

This paper provides general solutions of Problems 1 and 2. The latter provides a solution to [31, open problem 1 and 2],
albeit not in closed form.

In Section II we extend the water-filling result to ergodic channels where the transmitter has perfect knowledge of the
channel realization $H[k]$ at each symbol. In Section III we relax the degree of transmitter channel knowledge and consider the
ergodic channel with arbitrary channel distribution $p_H$, such that $p_H$, but not $H[k]$, is known to the transmitter.

The semidefinite constraint $Q \ge 0$ in Problem 2 would normally make the optimization difficult. However, in several cases,
the eigenvectors of the optimal $Q$ may be identified a-priori, which reduces the problem to an optimization over the space of
probability vectors. In independent work, [17] has found similar results to those presented in this paper for this "diagonalizable"
case. We avoid the requirement of diagonalizing $Q$. Our main result is the determination of the capacity achieving covariance
for arbitrary ergodic channels. This is achieved by finding necessary and sufficient conditions for optimality, which in turn
yield an iterative procedure for numerical optimization of $Q$, which finds the optimal eigenvectors in addition to the optimal
eigenvalues. In each section we provide numerical examples that illustrate the application of the main results. Conclusions are
drawn in Section IV. All proofs are to be found in the Appendix.



II. Perfect Transmitter Side Information

As described above, Telatar [4] solved Problem 1 for time-invariant deterministic channels. There are cases of interest
however when the transmitter and receiver have perfect side information, but the channel is time-varying. One model for this
case is to suppose that $H[k]$ is indeed time-varying, and that this sequence is a realization of a random process, in which each
$H[k]$ is selected independently at each symbol $k$ (and independently of the $x[k]$) according to some probability law $p_H$, so
the channel remains memoryless.

Subject to this model, we seek a solution to Problem 1 in which the sequence of channel matrices are generated i.i.d.
according to $p_H$. It is tempting to simply average (5) over the ordered eigenvalue density, $p_\Lambda(\lambda_1, \dots, \lambda_m)$, associated with $p_H$
(see for example [32]),

$$\int \sum_{i:\, \xi\lambda_i > 1} \log(\xi\lambda_i)\; p_\Lambda(\lambda_1, \dots, \lambda_m)\, d\lambda_1 \cdots d\lambda_m, \tag{9}$$

where $\xi$ is chosen according to (6) separately for each realization $\lambda_1, \dots, \lambda_m$.
This quantity is however in general not the capacity of the channel (1) with $H[k] \sim p_H$. A simple counter-example suffices to
show the problem.

Example 1: Consider a single-input single-output channel, $r = t = 1$, and let $p_H(\epsilon) = p_H(1) = 1/2$ where $\epsilon > 0$. Then
according to (9) in which water-filling precedes averaging, the resulting information rate is $\log(1 + \gamma\epsilon^2)/4 + \log(1 + \gamma)/4$
which as $\epsilon \to 0$ approaches $\log(1 + \gamma)/4$.

It is obvious however that as $\epsilon \to 0$, the transmitter should only transmit in symbol intervals in which $H = 1$, resulting in
the capacity $\log(1 + \gamma)/2$ which is a factor of two greater than the previous approach.

The problem with (9) is that it precludes optimization of the transmit density over time as well as space. The rate (9) is maximal
only under the assumption of a short-term power constraint $\operatorname{tr}(Q[k]) = 1$, rather than the long-term constraint $\operatorname{tr}(Q_N) = N$.
The following theorem is proved by solving the input distribution optimization problem from first principles (see Appendix).



Theorem 1: Suppose that the channel matrices $H[k]$ of an ergodic MIMO channel (1) are selected i.i.d. each symbol $k$
according to a matrix density $p_H$ which possesses an eigenvalue density $f_\lambda$. The capacity of this channel with perfect channel
knowledge at both the transmitter and the receiver is given by

$$\frac{C}{m} = \int_{\xi^{-1}}^{\infty} \log(\xi\lambda)\, f_\lambda(\lambda)\, d\lambda \quad \text{where $\xi$ is such that} \tag{10}$$

$$\frac{\gamma}{m} = \int_{\xi^{-1}}^{\infty} \left(\xi - \frac{1}{\lambda}\right) f_\lambda(\lambda)\, d\lambda. \tag{11}$$

It is interesting to note that not only does this Theorem yield the actual capacity, as opposed to the rate given by (9), it is also
easier to compute in most cases, since it is based on the distribution of an unordered eigenvalue.
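The water level in Theorem 1 depends only on the unordered eigenvalue distribution, so it can be found with a one-dimensional search. The following Python sketch shows one way this might be done when the density is represented by equally weighted samples of the eigenvalue; the bisection approach and the function names are assumptions made here for illustration, and logarithms are natural.

```python
import math

def space_time_water_level(lams, gamma_per_mode, tol=1e-12):
    """Solve mean((xi - 1/lam)^+) = gamma_per_mode for xi by bisection.

    lams: equally weighted samples of the unordered eigenvalue (zeros allowed).
    """
    inv = [1.0 / l for l in lams if l > 0]   # zero eigenvalues never get power
    n = len(lams)
    if not inv or gamma_per_mode <= 0:
        raise ValueError("need a positive eigenvalue and positive power")

    def avg_power(xi):
        return sum(max(0.0, xi - v) for v in inv) / n

    lo = min(inv)                    # no power is allocated at this level
    hi = lo + gamma_per_mode * n     # the best sample alone meets the target
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if avg_power(mid) < gamma_per_mode:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def capacity_per_mode(lams, xi):
    """Sample average of (log(xi * lam))^+, the right-hand side of (10)."""
    return sum(math.log(xi * l) for l in lams if xi * l > 1.0) / len(lams)
```

For an eigenvalue that is 0 or 1 with equal probability and $\gamma/m = 1$, the sketch returns $\xi = 3$ and a rate of $\tfrac{1}{2}\log 3$ per mode.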

Water-filling over space and time has been addressed to a limited extent in the literature. Tse and Viswanath give the result, 
without proof [33, Section 8.2.34]. Goldsmith also writes down the optimization problem (without solution) in [34, Equation 
(10.16)], and also in [26]. The correct space-time water filling approach is also implicit in [35], although no proof or discussion 
is offered. 

Let us now examine the optimal transmit strategy in more detail. Let $H[k] = U[k]\Lambda[k]V[k]$ be the singular value
decomposition of $H[k]$ and let $H_N$, $U_N$, $V_N$ and $\Lambda_N$ be the corresponding block diagonal matrices. Then the singular value
decomposition of the block diagonal matrix $H_N$ is

$$H_N = U_N \Lambda_N V_N. \tag{12}$$

This follows directly from the block-diagonal structure of $H_N$. The fact that the singular vectors are also in block-diagonal
form is important from an implementation point of view. If it had turned out that $H_N$ had full singular vector matrices, the
optimal transmission strategy would be non-causal.

The optimal transmit strategy uses a block-diagonal input covariance matrix,

$$Q_N = \operatorname{diag}\left(V^\dagger[1]\Gamma[1]V[1], \dots, V^\dagger[N]\Gamma[N]V[N]\right) \tag{13}$$

where $\Gamma[k] = \left(\xi I - \Lambda[k]^{-2}\right)^+$, using the notation $(\cdot)^+$ which replaces any negative elements with zero. The block-diagonal
structure means that the input symbols are correlated only over space, and not over time. At time $k$, the input covariance is
$Q[k] = V^\dagger[k]\Gamma[k]V[k]$. Thus the optimal transmit strategy is not only causal, but is instantaneous, i.e. memoryless over time.
At time $k$, the transmitter does not need to know any past or future values $H[j]$, $j \ne k$, in order to construct the
optimal covariance matrix.

The key thing to note from Theorem 1 is that the required power allocation is still water-filling on the eigenvalues of
$H[k]H^\dagger[k]$, but that the water level $\xi$ is chosen to satisfy the actual average power constraint, rather than a symbol-wise
power constraint. At any particular symbol time, the transmitter uses a power allocation $(\xi - 1/\lambda)^+$ for each eigenvalue $\lambda$ of
$H[k]H^\dagger[k]$, noting that $\xi$ is selected according to (11) rather than on a per-symbol basis, (6). This does not require any more
computation than symbol-wise water filling. In fact, it is simpler, since the transmitter only needs to compute the water level
$\xi$ once. Not only does space-time water filling give a higher rate, it is in this sense easier to implement.
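The instantaneous nature of the strategy can be seen in a few lines of code: once the water level $\xi$ has been fixed, each symbol's covariance depends only on the current channel matrix. The sketch below assumes NumPy and uses a hypothetical function name; it relies on `numpy.linalg.svd` returning `H = U @ diag(s) @ Vh`, so that `Vh` plays the role of $V$ in the text.

```python
import numpy as np

def instantaneous_covariance(H, xi):
    """Q[k] = V^dagger Gamma V for one realization H, given the water level xi.

    Returns (Q, gamma_k), where gamma_k holds the per-eigenmode powers
    (xi - 1/lambda)^+ for the eigenvalues lambda of H H^dagger.
    """
    _, s, Vh = np.linalg.svd(H, full_matrices=False)   # H = U diag(s) Vh
    lam = s ** 2                                        # eigenvalues of H H^dagger
    gamma_k = np.zeros_like(lam)
    active = lam * xi > 1.0                             # modes above the water line
    gamma_k[active] = xi - 1.0 / lam[active]
    Q = Vh.conj().T @ np.diag(gamma_k) @ Vh
    return Q, gamma_k
```

The same `xi` is reused for every symbol; only `H` changes, so no past or future channel values are needed.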

One possible argument against the use of space-time water-filling is that with this approach, there is a variable amount of 
energy transmitted at each symbol interval. In some cases that would certainly be undesirable (such as systems using constant 
envelope modulation). 

Theorem 2: The peak-to-average power ratio resulting from space-time water-filling, (10), (11) on an ergodic channel with
average power constraint $\gamma$ and unordered eigenvalue density $f(\lambda)$ such that $E[1/\lambda]$ exists is upper-bounded

$$\text{PAPR} \le 1 + \frac{m}{\gamma} E[\lambda^{-1}].$$

This is a particularly simple characterization of the PAPR. The term $m E[1/\lambda]/\gamma$ is the ratio of the average inverse eigenvalue
to the average symbol energy per eigen-mode.
It is also straightforward to compute the information rate $I$ that results from adjusting the space-time water-filling solution
to accommodate a peak-power limitation $\gamma_{\max}$,

$$\frac{I}{m} = \int_{\xi^{-1}}^{(\xi - \gamma_{\max})^{-1}} \log(\xi\lambda)\, f(\lambda)\, d\lambda \quad \text{where $\xi$ is such that}$$

$$\frac{\gamma}{m} = \int_{\xi^{-1}}^{(\xi - \gamma_{\max})^{-1}} \left(\xi - \frac{1}{\lambda}\right) f(\lambda)\, d\lambda.$$

Note that this is not the same as the capacity of the peak-power constrained channel. In practice however, it may be of interest,
since powers approaching $\xi$ are typically transmitted with vanishing probability. It is therefore of interest to consider the
probability density function $q(\gamma)$ of the per-eigenvector transmit power, $\gamma = \xi - 1/\lambda$. The obvious transformation yields the
density function.

Theorem 3: The probability density function $q(\gamma)$ of the energy $\gamma = \xi - 1/\lambda$ transmitted on each eigenvector according
to (10), (11) is given by

$$q(\gamma) = F(\xi^{-1})\,\delta(\gamma) + \frac{1}{(\xi - \gamma)^2}\, f\!\left(\frac{1}{\xi - \gamma}\right), \qquad 0 \le \gamma < \xi,$$

where $f(\cdot)$ is the unordered eigenvalue density, $F(\cdot)$ is the corresponding cumulative distribution and $\delta$ is the Dirac delta
function. The point mass at $\gamma = 0$ corresponds to the probability of transmitting nothing on that channel (when the gain is
less than $1/\xi$).

The following examples show some simple applications of the preceding space-time water-filling result. 

Example 2 (Parallel On-Off Channel): Consider an $m$-input, $m$-output channel with eigenvalue density $(1 - p)\delta(\lambda) + p\,\delta(\lambda - 1)$.
There are $m$ parallel channels and each channel is an independent Bernoulli random variable. With probability $p$, a channel
is "on" and with probability $1 - p$ it is "off".

Spatial water-filling yields the rate

$$E_k\left\{ k \log\left(1 + \frac{P}{k}\right) \right\}$$

where $k \sim \text{Binomial}(m, p)$ and $P$ is the average transmit power. It is straightforward to show however that the capacity is

$$C = mp \log\left(1 + \frac{P}{mp}\right)$$

which, as expected, is strictly larger than the former rate, a fact that can be seen from Jensen's inequality.
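The gap between the two strategies in the on-off example is easy to check numerically. The sketch below is written under two assumptions that should be read as this illustration's reading of the example rather than the authors' exact expressions: per-symbol water-filling splits the power $P$ equally over the $k$ channels that are currently on, giving the rate $E\{k \log(1 + P/k)\}$, and space-time water-filling gives $mp \log(1 + P/(mp))$, as follows from (10) and (11) with water level $\xi = 1 + P/(mp)$. The function name is hypothetical and rates are in nats.

```python
import math

def on_off_rates(m, p, power):
    """Return (spatial_rate, space_time_capacity) for the on-off channel."""
    spatial = 0.0
    for k in range(1, m + 1):                  # k = number of channels that are on
        pmf = math.comb(m, k) * p**k * (1 - p)**(m - k)
        spatial += pmf * k * math.log(1 + power / k)
    space_time = m * p * math.log(1 + power / (m * p))
    return spatial, space_time
```

Since $k \log(1 + P/k)$ is concave in $k$, Jensen's inequality guarantees that the first value never exceeds the second; for $m = 2$, $p = 1/2$, $P = 1$ the two rates are about 0.549 and 0.693 nats.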

Example 3 (Rayleigh, t = r = 1): Consider the single-input, single-output Rayleigh fading channel. Then $f(\lambda) = e^{-\lambda}$ and
$\xi$ is the solution to

$$\xi e^{-1/\xi} - \Gamma(0, \xi^{-1}) = P,$$

where $\Gamma(a, x)$ is the incomplete Gamma function [36, (8.350.2)] and $P$ is the average transmit power, i.e. $\gamma$ in (11). Figure 1 compares the resulting capacity to the rate obtained
via per-symbol water-filling. Note that in this case, the latter corresponds to the capacity when the transmitter does not know
the channel realization. In other words, application of the incorrect method results in ignoring the channel knowledge at the
transmitter.
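For the scalar Rayleigh case the water level can be found with a short root search. The Python sketch below evaluates (11) for $f(\lambda) = e^{-\lambda}$ and $m = 1$, which gives $\xi e^{-1/\xi} - \Gamma(0, 1/\xi)$, and uses the fact that integrating (10) by parts for this density gives $C = \Gamma(0, 1/\xi)$; that last step is a derivation added here, not a formula quoted from the text. $\Gamma(0, x)$ is computed from its standard power series, which this sketch only trusts for $0 < x \le 10$, so very small powers are rejected. All function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp1(x):
    """Gamma(0, x) = E_1(x) from its power series; used only for 0 < x <= 10."""
    if not 0.0 < x <= 10.0:
        raise ValueError("series only used for 0 < x <= 10")
    total, term = 0.0, 1.0
    for k in range(1, 200):
        term *= -x / k              # term = (-x)^k / k!
        total += term / k
    return -EULER_GAMMA - math.log(x) - total

def avg_power(xi):
    """Right-hand side of (11) for f(lambda) = exp(-lambda) and m = 1."""
    return xi * math.exp(-1.0 / xi) - exp1(1.0 / xi)

def rayleigh_siso_water_level(power):
    """Bisection for the water level xi with avg_power(xi) == power."""
    lo, hi = 0.1, 1.0               # xi >= 0.1 keeps 1/xi inside the series range
    if avg_power(lo) > power:
        raise ValueError("power too small for this sketch")
    while avg_power(hi) < power:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if avg_power(mid) < power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rayleigh_siso_capacity(power):
    """Capacity in nats with perfect side information at both ends."""
    return exp1(1.0 / rayleigh_siso_water_level(power))
```

At $P = 1$ (0 dB) the water level comes out between 2.5 and 2.7, and the resulting capacity exceeds the value $e\,\Gamma(0, 1) \approx 0.596$ nats obtained when the transmitter spends the same power in every symbol.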
Fig. 2. Rayleigh channel, t = r = 2.

Example 4 (Rayleigh t = r = 2): Consider the two-input, two-output Rayleigh fading channel. Then $f(\lambda) = \frac{1}{2}(\lambda^2 - 2\lambda + 2)e^{-\lambda}$ and
$\xi$ is the solution to

$$e^{-1/\xi}(2\xi + 1) - 2\Gamma(0, \xi^{-1}) = P.$$

Figure 2 compares the resulting capacity to the rate obtained via per-symbol water-filling and to the rate obtained with
$Q = PI/t$. The curves for space-time water-filling and spatial water-filling almost coincide on this figure. This is however
hiding the additional gain provided by space-time water-filling at low SNR. Figure 3 shows the relative gains, compared
to $Q = PI/t$, for space-only and space-time water-filling. Obviously, as SNR $\to \infty$, both gains approach 1, since there is
asymptotically no benefit in water filling of any kind. At SNR below 0 dB, space-time water-filling yields significant benefit
compared to water-filling only over space.




Fig. 3. Rayleigh channel, t = r = 2. (Horizontal axis: SNR, dB.)

Example 5 (Rayleigh t = r = 4): Figure 4 shows the relative capacity gain over $Q = PI/t$ for a four-input, four-output
system. Obviously the additional gain over spatial water-filling is decreased compared to the $t = r = 2$ case. In fact as
$r \to \infty$, there is asymptotically no extra gain to be found by additionally water-filling over time as well as space. As
the dimension increases, the eigenvalue density converges to the well-known limit law, holding on a per-symbol basis. Thus
space-time water filling on Rayleigh channels is of most importance for small systems.

Figure 5 shows the peak-to-average power ratio in decibels for $t = r = 1, 2, 4$. Note that this is the exact value of the PAPR.
For Rayleigh channels with finite $m$, the bound of Theorem 2 does not apply, since $E[1/\lambda]$ does not exist. From this figure,
the peak-to-average power is relatively insensitive to the system dimensions for the Rayleigh channel. The particular values of
PAPR are comparable with what may be experienced in an orthogonal frequency division multiplexing system.

As described earlier, the peak-to-average power ratio may be misleading, since it is conceivable that the peak power may
only be transmitted infrequently. Figure 6 shows the probability density function of the power transmitted per-eigenvector for
$t = r = 2$. At low SNR, the density is broad and has significant mass above the target average power $P/m$. As the SNR
increases, the density converges to an impulse at $P/m$.

Fig. 5. Peak-to-Average Power Ratio.

III. Statistical Transmitter Side Information

It is tempting to think that $Q = I/t$ is optimal when the transmitter has no knowledge about the channel, and assertions to
this effect have appeared in the literature. In the complete absence of transmitter side information however (i.e. the transmitter
does not even know $p_H$), the underlying information theoretic problem is difficult to define. There are several possibilities, for
instance $p_H$ may be selected somehow randomly from a set of possible channel densities. Alternatively, $p_H$ could be fixed, but
unknown, in the spirit of classical parameter estimation. In the absence of a thorough problem formulation and corresponding
analysis, it is clear that optimality of $Q = I/t$ is at best conjecture. For example, in the case where $p_H$ is drawn randomly
from a set of possible densities, it may be an outage probability that is of interest. This problem is not completely solved even
when the $p_H$ are degenerate (i.e. the non-ergodic channel of Telatar), and in that case transmission on a subset of antennas is
believed to be optimal. We do not consider these more difficult problems, and restrict attention to transmitter knowledge of
$p_H$.

Fig. 6. Per-eigenvector transmit power density, t = r = 2.

The result (8) arises from [4, Theorem 1] and holds for an independent, identically distributed, circularly symmetric Gaussian
channel matrix $H$, independent of the transmit symbols. In general, $Q = I_t/t$ is not optimal, and thus provides only a lower bound
to capacity. Several authors [37] have investigated the scenario of transmitting equal power, independent Gaussian signals
for various correlated central and non-central random matrix channels. Other work [38] has examined worst-case mutual
information in the absence of transmitter side information, while [39] has applied game-theoretic analysis to the problem of
equal power transmission, observing that (in the absence of any better option) uniform power allocation is not "so bad."

In the previous section, we considered the optimal transmit covariance for perfect transmitter side information. We shall 
now relax this constraint, so the transmitter has statistical side information only, which is a well-posed information theoretic 
problem. 

There are two main areas of interest. Firstly, in some scenarios, the eigenvectors of the optimal input covariance Q can be 
determined a-priori (typically by inspection). Several authors have described optimization of input covariance, by diagonalization 
of the transmit covariance [12,18,40]. In other work, [14] has outlined optimality conditions for beamforming vs MIMO 
diversity. Recent work [41] has also investigated the case where input and channel covariance matrices are jointly diagonalizable. 

The more general case is when the eigenvectors of the optimal input covariance structure are not apparent a-priori, and may
in fact be complicated functions of $p_H$. This is the main area of interest in this paper, and Theorem 8 (and the resulting iterative
optimization procedure) is our main result. We will begin in Section III-A by finding the optimal $Q$ in the diagonalizable
case, which results in an interesting comparison to water-filling. Section III-B extends the result to arbitrary $p_H$.

A. Diagonalizable Covariance 

Solution of Problem 2 is in general a semidefinite program, since the maximization is over the cone of positive semidefinite
Hermitian matrices $Q \ge 0$. In certain cases however, the problem simplifies, and we can obtain convenient conditions for
optimality from the Kuhn-Tucker conditions. The simplest case, $H \sim \mathcal{N}_{r,t}(0, I \otimes I)$, was solved in [4]. Other special
cases have been solved in [12, 40]. Independent work finding similar results to those described below has appeared in [17].

Suppose it can be determined that the optimal $Q$ has the form

$$Q = U \hat{Q} U^\dagger \tag{14}$$

$$\hat{Q} = \operatorname{diag}(q_1, q_2, \dots, q_t) \tag{15}$$

for some fixed $U$. For such channels, the optimization problem reduces to finding the best allocation of power to each column
of $U$.

One important example is $H[k] \sim \mathcal{N}_{m,m}(0, R \otimes T)$, i.e. the Kronecker correlated Rayleigh channel with no line-of-sight
components. In that case, it is known that $U$ diagonalizes $T$ and optimal transmission is independent on each eigenvector of
$T$.

In such cases, the condition $Q \ge 0 \Rightarrow \hat{Q} \ge 0$ allows the application of the Kuhn-Tucker conditions for maximization of
a concave function over the space of probability vectors [7, p. 87] to yield the following lemma.

Lemma 1: Consider the channel (1) with $H[k] \sim \mathcal{N}_{m,m}(0, R \otimes T)$. The optimal covariance $Q$ has the form (14) and satisfies
the Kuhn-Tucker conditions [7, p. 87]

$$\frac{\partial \Psi(Q)}{\partial q_i} = \mu, \qquad q_i > 0 \tag{16}$$

$$\frac{\partial \Psi(Q)}{\partial q_i} \le \mu, \qquad q_i = 0 \tag{17}$$

where $\mu$ is a constant independent of $q_i$, and $q_i$ are given by (15).

Thus the necessary and sufficient conditions for optimality have a particularly simple form. Differentiating $\Psi(Q) = E\{\log\det(I + \gamma H Q H^\dagger)\}$
leads to the following theorem, proved in [10].

Theorem 4 (Optimal Covariance): Consider the ergodic channel (1) with $p_H$ such that the optimal input covariance is known
to be of the form (14)-(15) for some fixed unitary matrix $U$. A necessary and sufficient condition for the optimality of the
diagonal $\hat{Q}$ in (15) is

$$E_S\left\{\left[(I + S\hat{Q})^{-1} S\right]_{kk}\right\} = \mu, \qquad q_k > 0 \tag{18}$$

$$E_S\left\{\left[(I + S\hat{Q})^{-1} S\right]_{kk}\right\} \le \mu, \qquad q_k = 0 \tag{19}$$

for $k = 1, 2, \dots, t$ and some constant $\mu$. The expectation is with respect to the random matrix $S = \gamma U^\dagger H^\dagger H U$. The notation
$(A)_{ij}$ denotes element $ij$ of $A$.
In the case $\hat{Q} > 0$, the condition (18) may be re-written as a fixed-point equation

(20)

which suggests the following iterative procedure for numerically finding the optimal $\hat{Q}$. Starting from an initial diagonal
$\hat{Q}^{(0)} > 0$, compute

(21)

selecting $\nu^{(i)}$ at each step to keep $\operatorname{tr}(\hat{Q}^{(i)}) = \gamma$. Although there is no known closed form solution for $E_S\{[(\hat{Q}^{-1} + S)^{-1}]_{kk}\}$,
it may be accurately estimated using Monte Carlo integration. Note that the numerical procedure may be applied to each entry
$q_k = \hat{Q}_{kk}$ separately for a given $\hat{Q}^{(i)}$. Numerically, each fixed point iteration is performed once and the $t$ non-zero diagonal
entries of $\hat{Q}$ are updated.
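To make the iterative procedure concrete, here is a minimal NumPy sketch of one possible fixed-point rearrangement. It is an assumption of this illustration, not necessarily the authors' exact update: multiplying the equality in condition (18) by $q_k$ gives $q_k = \nu\, q_k\, E_S\{[(I + S\hat{Q})^{-1} S]_{kk}\}$ with $\nu = 1/\mu$, and since $(I + S\hat{Q})^{-1} S \hat{Q} = I - \hat{Q}^{-1}(\hat{Q}^{-1} + S)^{-1}$ this is the same quantity as $\nu\,(1 - q_k^{-1} E_S\{[(\hat{Q}^{-1} + S)^{-1}]_{kk}\})$. The expectation is replaced by an average over supplied draws of $S$, and the rescaling to a fixed trace plays the role of $\nu^{(i)}$. The function name, the argument `total_power` and the iteration count are hypothetical; convergence is not proven here.

```python
import numpy as np

def optimize_diag_covariance(S_samples, total_power, n_iter=200):
    """Multiplicative fixed-point update for the diagonal power allocation.

    S_samples: array of shape (n, t, t) holding Monte Carlo draws of the
    Hermitian positive semidefinite matrix S (not all zero).
    Returns q with q >= 0 and sum(q) == total_power.
    """
    S = np.asarray(S_samples)
    n, t, _ = S.shape
    q = np.full(t, total_power / t)            # strictly positive starting point
    eye = np.eye(t)
    for _ in range(n_iter):
        # (I + S Qhat)^{-1} S for every draw; S * q scales column k by q_k.
        A = np.linalg.solve(eye + S * q, S)
        grad = np.diagonal(A, axis1=1, axis2=2).real.mean(axis=0)
        q = q * grad                           # q_k <- q_k * E{[...]_kk}
        q *= total_power / q.sum()             # rescale, playing the role of nu^(i)
    return q
```

With a single deterministic draw $S = \operatorname{diag}(2, 1)$ and total power 2, the iteration settles at the classical water-filling allocation $(1.25, 0.75)$, which gives a quick sanity check before using random draws.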

It is interesting to compare the conditions (18), (19) with the solution of Problem 1 for perfect transmitter side information.
Suppose $H[i] = H$ is known at the transmitter with $H^\dagger H = U S U^\dagger$ being the eigenvalue decomposition of $H^\dagger H$. The
Kuhn-Tucker condition for optimality of the input covariance $Q = U \hat{Q} U^\dagger$ can be written in the following form,

$$\left[(I + S\hat{Q})^{-1} S\right]_{kk} = \mu, \qquad q_k > 0 \tag{22}$$

$$\left[(I + S\hat{Q})^{-1} S\right]_{kk} \le \mu, \qquad q_k = 0 \tag{23}$$

with $\hat{Q}$ satisfying (15). Solution of these equations is straightforward and leads easily to (5) and (6).

Comparing (18) with (22) it can be seen that the only difference is the presence of the expectation in (18). Similarly for
(19) and (23). This is no real surprise, and is due to the interchangeability of differentiation and expectation. The result of
Theorem 4 is a direct generalization of the classical water-filling result for parallel channels [42], where the transmitter has
statistical side information, and the channel can be diagonalized a-priori. In the latter case however, there is no water-filling
interpretation [43].

For the deterministic case, it is clear that increasing $\gamma$ can only increase the power allocated to any particular eigenvector
(the water level rises). The same thing happens in the ergodic case, as demonstrated by the following theorem, proved in the
Appendix.

Theorem 5: Let $\hat{Q} = \operatorname{diag}(q_1, \dots, q_t)$ be the eigenvalues of the optimal covariance matrix for a channel with signal-to-noise
ratio $\gamma$, satisfying the conditions of Theorem 4. Then

$$\frac{\partial q_k}{\partial \gamma} \ge 0, \qquad k = 1, 2, \dots, t.$$

Thus a signal-to-noise ratio increase (decrease) can only increase (decrease) the power allocated to each eigenvector of the
optimal covariance matrix.

Theorem 4 takes care of zero-mean Rayleigh fading channels with separable correlation structure. In the case of Ricean
fading with non-zero mean, one approach is to use the following approximation by a central distribution.

Lemma 2 (Wishart Approximation [44]): Suppose $H \sim \mathcal{N}_{r,t}(M, I \otimes T)$. Then $S = H Q H^\dagger$ may be approximated by a
central Wishart matrix [44, p. 125]

$$S \sim \mathcal{W}_t(0, \Sigma) \tag{24}$$

$$\Sigma = T^{1/2} Q T^{1/2} + \frac{1}{r} M^\dagger M \tag{25}$$

This approximation motivates application of Theorem 4 to the Ricean case with $H[k] \sim \mathcal{N}_{r,t}(M, I \otimes T)$. The relation between
correlation and line-of-sight (non-zero mean) has been heuristically established in MIMO channel measurement literature [45-47].
The accuracy of this approximation is investigated numerically below.

In Figure 7 we have plotted the capacity and the mutual information for a channel with rank-one mean $M = \mathrm{diag}\{1, 0, \dots, 0\}$
and non-diagonal transmit covariance 



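$$T = \begin{pmatrix} 1 & \tau & \tau & \cdots \\ \tau & 1 & \tau & \cdots \\ \vdots & & \ddots & \end{pmatrix} = \tau \mathbf{1} + \mathrm{diag}\{1 - \tau\} \qquad (26)$$

where $\mathbf{1}$ is a matrix of all ones.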

The plot compares the capacity (optimal input covariance, with true probability law) with the mutual information (input
covariance given by the central Wishart approximation) for various SNR and numbers of transmit and receive elements. Each plot
assumes $t = r$. We note that the approximated covariance matrix is a linear combination of the transmit-end covariance
$T$ and the mean, and thus the approximated input covariance is dominated by beamforming on $M$ at low SNR, and by $T$ at higher
SNR.

Beamforming, i.e. rank-one transmission with $Q = \mathrm{diag}(1, 0, 0, \dots, 0)$, is a particularly simple strategy, which is optimal at
low SNR (see Section III-D). It is interesting to consider the conditions under which beamforming is optimal.

Theorem 6: Consider an ergodic channel (1) with $H \sim \mathcal{N}_{t,r}(0, R \otimes T)$, where without loss of generality $R = \mathrm{diag}(\rho_1, \dots, \rho_r)$
and $T = \mathrm{diag}(\tau_1, \dots, \tau_t)$ with $\mathrm{tr}(T) = t$ and $\mathrm{tr}(R) = r$. Beamforming is optimal if and only if

^^+15 }>r- foranyfc>2, (27) 

1 + JTlW RU J Tl 
Fig. 7. Mutual information $I$ with central Wishart approximation, for non-central channel. Solid lines give $I(Q)$ (capacity) for optimal input covariance,
while dashed lines give $I(Q_a)$ where $Q_a$ is optimal according to a central Wishart approximation. Closest results are given at high and low SNR and small
numbers of elements.

where the expectation is with respect to a length $r$ Gaussian vector with i.i.d. unit variance entries, $u \sim \mathcal{N}_{r,1}(0, I)$.
The left hand side of (27) is monotonically decreasing with signal-to-noise ratio.

For zero-mean Rayleigh channels, the condition (27) can be found in closed form [48]. In the appendix we give an alternate
proof to that given by [48]. Our proof is simplified via use of Theorem 6.
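
The derivative comparison behind this condition can also be checked numerically. The Python sketch below estimates, by Monte Carlo, the quantities $E\{[(I + SE_{11})^{-1}S]_{kk}\}$ used in the Appendix proof of Theorem 6, with $S = \gamma T^{1/2}X^\dagger R X T^{1/2}$ and $X$ an $r \times t$ matrix of i.i.d. unit-variance complex Gaussian entries. The function name `beamforming_margins`, the sample count and the seed are illustrative assumptions and do not appear in the paper.

```python
import numpy as np

def beamforming_margins(gamma, rho, tau, n_samples=4000, seed=0):
    """Monte Carlo estimate of Delta_1 - Delta_k for k >= 2, where
    Delta_k = E{[(I + S E_11)^{-1} S]_kk} and S = gamma T^{1/2} X^H R X T^{1/2}.
    Hypothetical helper: names and defaults are assumptions of this sketch."""
    rng = np.random.default_rng(seed)
    r, t = len(rho), len(tau)
    R = np.diag(rho).astype(float)
    T_half = np.diag(np.sqrt(tau))
    E11 = np.zeros((t, t))
    E11[0, 0] = 1.0                      # rank-one (beamforming) covariance
    acc = np.zeros(t)
    for _ in range(n_samples):
        # i.i.d. unit-variance complex Gaussian entries
        X = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
        S = gamma * T_half @ X.conj().T @ R @ X @ T_half
        D = np.linalg.solve(np.eye(t) + S @ E11, S)   # (I + S E_11)^{-1} S
        acc += np.diag(D).real
    delta = acc / n_samples
    return delta[0] - delta[1:]

# Example: low SNR with well-separated transmit eigenvalues.
margins_low_snr = beamforming_margins(0.01, [1.0, 1.0], [1.8, 0.2])
```

A non-negative margin for every $k \ge 2$ suggests that beamforming on the first transmit eigenvector is optimal for the chosen $(\gamma, R, T)$, up to Monte Carlo error.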

Theorem 7 (Simon and Moustakas [48]): Consider an ergodic channel (1) with $H \sim \mathcal{N}_{t,r}(0, R \otimes T)$, where without loss
of generality $R = \mathrm{diag}(\rho_1, \dots, \rho_r)$ and $T = \mathrm{diag}(\tau_1, \dots, \tau_t)$ with $\mathrm{tr}(T) = t$ and $\mathrm{tr}(R) = r$. Beamforming is optimal if and
only if



E 



where 



Uk^jiP] - Pk) 
fimpi) - fiiTipj) 



> nr2 (28) 



C, 



Pi - Pj 

1A„/(™)A 

.Pl V "/Tip, J 

$$f(x) = e^{1/x}\,\Gamma(0, 1/x).$$

In the above theorem, note that $\zeta_{ii}$ is just the limit of $\zeta_{ij}$ as $\rho_i \to \rho_j$. Theorem 6 is a generalization of [49] (which was for
the MISO case), and the MISO result is recovered easily from (27) via $r = 1$.

Figure 8 shows the beamforming optimality condition for a set of SNR levels $\gamma$ and a $2 \times 2$ channel, with
$H \sim \mathcal{N}_{2,2}(0, R \otimes T)$ where $R = \mathrm{diag}\{\rho, 2 - \rho\}$ and $T = \mathrm{diag}\{\tau, 2 - \tau\}$, $1 < \rho, \tau < 2$. The plot is symmetric around the
point $\rho = \tau = 1$ (and thus, only the top-left quadrant of the full $0 < \tau, \rho < 2$ plot is shown).

The lines provide the transition point from regions where beamforming is optimal (above each line) to regions where
beamforming is not optimal. The plot shows the region for $1 < \tau, \rho < 2$. For $\tau = 1$, $T = I$ and for $\tau \to 2$, $T$ becomes
singular, and similarly for $R$, so that the top right-hand corner of the plot has highly correlated $H$, whilst the bottom left-hand
corner has i.i.d. $H$.

It can be seen that for low SNR, $\gamma = -15$ dB, beamforming is almost always optimal, with the transition occurring for
$\tau \approx 1.03$. Note also that the eigenvalues of $R$ have little effect on the optimality of beamforming at low SNR. As SNR
increases, the region of admissible covariance matrices for optimal beamforming shrinks: beamforming then requires covariance matrices
with larger eigenvalue separation. The optimality of beamforming is clearly dependent upon the eigenvalues of $T$. At higher
SNR, the optimality of beamforming is also dependent on $R$ (as can be seen from the $\gamma \ge 0$ dB curves). The reason for this is
that the low rank of $R$ results in an effective power loss at the receiver.
Fig. 8. Optimality of beamforming. Beamforming is optimal for a given SNR for all points $(\tau, \rho)$ above the line corresponding to that SNR value. The plot
is symmetric for $0 < \rho < 1$ and $0 < \tau < 1$.

B. The General Case 

We now wish to solve Problem 2 without the a-priori requirement of diagonal input covariance. In this case, we need
to maximize $\Psi(Q, p(S))$ over all positive definite $Q$. In particular we do not wish to restrict ourselves to particular matrix
densities such as the zero-mean Kronecker Gaussian model.

Whilst of interest in its own right, this problem arises when the input covariance structure cannot be solved by inspection. 
Specific examples include the non-central Gaussian random matrix channel, where the channel covariance and mean are not 
jointly diagonalizable, and for several random matrix channels which do not have simple (Kronecker) factorizations [29,50]. 

To accommodate the positive definite constraint on $Q$, we apply the Cholesky factorization, so the constraint becomes
implicit in the final solution. By adopting this approach we force the optimization to only consider the minimum number of
independent variables required for solution, $t(t+1)/2$ rather than $t^2$.

Any non-negative definite matrix $A$ may be written as [51]

$$A = \Gamma^\dagger \Gamma \qquad (29)$$

for upper triangular matrix $\Gamma$, with the diagonal elements $d_{ii}$ real and non-negative. Similarly, for a given upper triangular matrix
$\Gamma$, the product $\Gamma^\dagger\Gamma$ is non-negative definite. The following useful properties [44] arise from (29):

$$\mathrm{tr}(A) = \mathrm{tr}(\Gamma^\dagger\Gamma) = \sum_{i \le j} |d_{ij}|^2, \qquad \det(A) = \prod_i d_{ii}^2.$$
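
The two properties are easy to confirm numerically. The following Python sketch builds a random upper triangular factor with real non-negative diagonal, forms $A = \Gamma^\dagger\Gamma$, and compares the trace and determinant of $A$ with the entry-wise expressions. Variable names such as `gamma_mat` are illustrative assumptions. Note that `numpy.linalg.cholesky` returns a lower triangular factor $L$ with $A = LL^\dagger$, so the upper triangular factor used here is $L^\dagger$.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 4

# Random upper triangular factor; the diagonal entries d_ii are made real and non-negative.
gamma_mat = np.triu(rng.standard_normal((t, t)) + 1j * rng.standard_normal((t, t)))
gamma_mat[np.diag_indices(t)] = np.abs(np.diag(gamma_mat))

# A = Gamma^H Gamma is Hermitian and non-negative definite by construction.
A = gamma_mat.conj().T @ gamma_mat

trace_from_entries = np.sum(np.abs(gamma_mat) ** 2)         # sum_{i<=j} |d_ij|^2
det_from_diagonal = np.prod(np.diag(gamma_mat).real ** 2)   # prod_i d_ii^2

# Recover the upper triangular factor from A (valid here because A is positive definite).
L = np.linalg.cholesky(A)     # lower triangular, A = L L^H
gamma_rec = L.conj().T        # upper triangular, A = gamma_rec^H gamma_rec
```

Working with the $t(t+1)/2$ entries of `gamma_mat` keeps the non-negative definiteness of $A$ implicit, which is the reason the factorization is used in the optimization.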

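Using (29), transform Problem 2 to

Problem 3 (Equivalent to Problem 2):

$$\max_{\Gamma} \Psi(\Gamma^\dagger\Gamma, p_H)$$

subject to

$$\sum_{i \le j} |d_{ij}|^2 = 1, \qquad d_{ii} \ge 0 \quad \forall i.$$

The maximum of $\Psi$, attained at the optimal $d^\circ$, is not improved by choosing a trace less than unity, hence the equality in the first constraint.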

Problem 3 admits a quadratic optimization approach, using Lagrange multipliers [52]. The optimization in Problem 3 occurs
on the (upper triangular) matrix $\Gamma$ which has exactly $t(t+1)/2$ independent (complex) variables. This corresponds to the
number of independent variables for the optimization over $Q$ in Problem 2, since $Q = U\hat{Q}U^\dagger$ has $t$ independent variables in
the diagonal matrix $\hat{Q}$ and $t(t-1)/2$ independent variables in the unitary matrix $U$.

In order to solve Problem 3 we produce a modified cost function $J(\nu, \mu, \phi)$ where $\nu = \mathrm{vec}(\Gamma)$, and $\mu$ and $\phi$ are vectors of Lagrange
multipliers corresponding to equality and inequality constraints. For this we use the following:
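Lemma 3 (Application of Kuhn-Tucker Theorem [53]): Given a convex $\cap$ function $f(\nu)$ of a vector $\nu$, where $\nu$ is constrained
by:

$$\sum_{i \le j} |\nu_{ij}|^2 = 1, \qquad \nu_{ii} \ge 0$$

then

$$\frac{\partial f}{\partial \nu_{ij}} = 2\mu\nu_{ij}, \quad i \ne j,\ \mu > 0 \qquad (30)$$

$$\frac{\partial f}{\partial \nu_{ii}} = 2\mu\nu_{ii}, \quad \nu_{ii} > 0,\ \mu > 0 \qquad (31)$$

$$\frac{\partial f}{\partial \nu_{ii}} \le 0, \quad \nu_{ii} = 0 \qquad (32)$$

defines a maximum point for the function $f(\nu)$.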

Lemma 3 provides the necessary conditions for a vector $\nu = \mathrm{vec}(\Gamma)$ to give a capacity achieving input covariance. We now
present the main result of the paper: a general condition for the capacity achieving input covariance.

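Theorem 8 (Optimal Transmit Covariance): Given a MIMO channel (1) with the channel chosen ergodically according to a
probability distribution $p_H$, the capacity achieving input is Gaussian with covariance $Q = \Gamma^\dagger\Gamma$ where $\Gamma$ is upper triangular,
and the element $d_{ij}$ satisfies:

$$E_S\left\{\mathrm{tr}\left[(I + S\Gamma^\dagger\Gamma)^{-1} S E^{(ij)}\right]\right\} = \begin{cases} 2\mu d_{ij}, & i \ne j,\ \mu > 0 \\ 2\mu d_{ii}, & d_{ii} > 0,\ \mu > 0 \end{cases} \qquad (33)$$

$$E_S\left\{\mathrm{tr}\left[(I + S\Gamma^\dagger\Gamma)^{-1} S E^{(ii)}\right]\right\} \le 0, \quad d_{ii} = 0 \qquad (34)$$

where the expectation is with respect to $S = H^\dagger H$, the constant $\mu$ is chosen to satisfy the power constraint and

$$E^{(ij)} = \frac{\partial\, \Gamma^\dagger\Gamma}{\partial d_{ij}}, \qquad [E^{(ij)}]_{mn} = \delta_{mj}\, d_{in} + \delta_{nj}\, d_{im}$$

with $\delta_{ij} = 1$ when $i = j$ and zero otherwise.

The capacity of the channel is then given by application of $\Gamma$ in $\Psi(\Gamma^\dagger\Gamma, p(S))$:

$$C = E\{\log\det(I_t + S\Gamma^\dagger\Gamma)\}$$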

Given the result of Theorem 8 we wish to numerically evaluate the optimal covariance, and hence capacity, for an arbitrary
multiple-input, multiple-output channel. Fortunately, the form of (33) also lends itself to a fixed-point algorithm.
If we define the matrix

$$M = E\{(I + S\Gamma^\dagger\Gamma)^{-1} S\} \qquad (35)$$

then

$$\mathrm{tr}(M E^{(ij)}) = \sum_k (m_{kj} + m_{jk})\, d_{ik} = [\Gamma(M + M^T)]_{ij} \qquad (36)$$

The matrix $M$ may be interpreted as a differential operator, on the function $\Psi(\Gamma^\dagger\Gamma, p(S))$, evaluated at a particular value of
$\Gamma$. This provides a direct fixed-point equation of projected gradient type [54]:

r.C^+i) = -ij/W • VEsj* } (37) 

Writing this out completely gives the following algorithm.

Algorithm 1 (Iterative Power Allocation):

1) Update using (35)

$$\Gamma^{(k+1)} = \Gamma^{(k)}(M + M^T) \qquad (38)$$

2) Scale

$$d_{ij}^{(k+1)} = \begin{cases} \frac{1}{\mu}\,[\Gamma^{(k+1)}]_{ij}, & i \le j \\ 0, & \text{otherwise} \end{cases} \qquad (39)$$

with $\mu$ constant for all $i, j$ and chosen so that $\mathrm{tr}(\Gamma^\dagger\Gamma) = 1$.

3) Repeat

We denote $\Gamma^{(k)}$ as the triangular matrix at iteration $k$. This algorithm may be initiated with any (upper triangular) $\Gamma$ satisfying
$\mathrm{tr}(\Gamma^\dagger\Gamma) = 1$. The expectation required in (38) is typically intractable and may be evaluated using Monte Carlo integration.
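
A minimal Python sketch of this iteration follows. It rests on several assumptions that are ours rather than the paper's: $S$ and $\Gamma$ are real-valued (so that $M^T$ is an ordinary transpose), the expectation in (35) is replaced by an average over a supplied list of samples of $S$, and the scaling step is implemented as a Frobenius normalization, which enforces $\mathrm{tr}(\Gamma^T\Gamma) = 1$. The function names are hypothetical. No damping or step-size control is included, so the sketch is not guaranteed to settle for every channel and SNR.

```python
import numpy as np

def mutual_information(Gamma, S_samples):
    """Sample average of log det(I + S Gamma^T Gamma), in nats."""
    Q = Gamma.conj().T @ Gamma
    t = Q.shape[0]
    vals = [np.linalg.slogdet(np.eye(t) + S @ Q)[1] for S in S_samples]
    return float(np.mean(vals))

def iterate_power_allocation(S_samples, Gamma0, n_iter=50):
    """Sketch of the update / scale / repeat loop (real-valued case only)."""
    t = Gamma0.shape[0]
    Gamma = Gamma0 / np.linalg.norm(Gamma0)      # Frobenius norm 1 <=> tr(Gamma^T Gamma) = 1
    for _ in range(n_iter):
        Q = Gamma.T @ Gamma
        # Sample estimate of M = E{(I + S Q)^{-1} S}
        M = np.mean([np.linalg.solve(np.eye(t) + S @ Q, S) for S in S_samples], axis=0)
        Gamma = np.triu(Gamma @ (M + M.T))       # update, keep the upper triangular part
        Gamma = Gamma / np.linalg.norm(Gamma)    # scale to unit trace of Gamma^T Gamma
    return Gamma

# Example: a deterministic low-SNR channel with diagonal S (a single "sample").
S_samples = [np.diag([0.2, 0.1])]
Gamma0 = np.eye(2)
Gamma_opt = iterate_power_allocation(S_samples, Gamma0)
Q_opt = Gamma_opt.T @ Gamma_opt
```

For this small example water-filling allocates all power to the stronger mode, so `Q_opt` should be close to $\mathrm{diag}(1, 0)$. For random channels, `S_samples` would hold Monte Carlo draws of $H^\dagger H$.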
Fig. 9. Convergence of Algorithm 1 with $4 \times 4$ matrix, $S = US_0U^\dagger$ (40). Horizontal axis: number of iterations. C = 1.1394 nat/s



Theorem 9: Algorithm 1 converges to the optimal covariance $Q^\circ = \Gamma^\dagger\Gamma$.

We note that the stability of the algorithm is directly affected by the stability of the expectation in (35). In particular, at
high SNR, the off-diagonal entries of $\Gamma$ will approach zero (since $Q = \alpha I$ is optimal). In this case, the elements of $\Gamma$ may
fluctuate as small movements over the Haar manifold (small changes in eigenvectors) result in large changes in the entries of
$\Gamma$.

In Figure 9 we show an example of the convergence of the algorithm for several deterministic channel matrices. Each curve
shows the difference between the mutual information for $Q = \Gamma^\dagger\Gamma$ and the channel capacity $C$ at the $k$-th iteration.

The example channel matrices were chosen to have common eigenvalues, but randomly chosen eigenvectors (thus each 
instance has the same capacity, but different optimal input covariance), with 

S^USoU\So = il'l) (40) 

In Figure 10 we have shown the convergence of Algorithm 1 for different matrix dimensions, correlations for $T$ and SNR
values. In each plot the channel is a non-zero mean, correlated Gaussian, $H \sim \mathcal{N}_{n,n}(M_0, I \otimes T)$, where $M_0 = \mu\mu^\dagger$ for a
random vector $\mu \in \mathbb{C}^{n \times 1}$. The plots have been averaged over different values of $M_0$. Each convergence is run independently
with a random initial value of $\Gamma$. Algorithm 1 converges to the capacity of the channel, although the convergence rate decreases
for larger dimensions. As the channel dimension (and/or SNR) increases, the algorithm becomes more reliant on accurate
Monte Carlo integration, and thus individual iterations take an increasingly long time.

C. Gaussian channel, non-commuting mean and covariance 
Consider a channel where

$$H = \kappa M_0 + (1 - \kappa) X \qquad (41)$$

$$X \sim \mathcal{N}_{n,m}(0, I \otimes \Sigma), \quad 0 \le \kappa \le 1 \qquad (42)$$

using the notation of [44]. Further, we shall assume that the matrices $M_0$ and $\Sigma$ may not be jointly diagonalized (which is
equivalent to the Hermitian matrices $M_0$ and $\Sigma$ being non-commuting [55, pp. 229]). We ask: How does the optimal covariance
relate to $M_0$ and $\Sigma$, as $\kappa$ varies between 0 and 1?

For the purpose of providing graphical results we shall limit ourselves to a $2 \times 2$ case. While the numerical solution of this
problem is straightforward with Algorithm 1, describing the outcome poses several problems: it is insufficient to investigate
only the entries of $Q$, since the subspace over which the optimal $Q$ acts will change as $\kappa$ varies.

We note that the optimal covariance has eigenvectors which are not trivially related to the eigenvectors of the mean $M_0$
or variance $\Sigma$. Further, the eigenvectors are not given by a direct interpolation between $M_0$ and $\Sigma$, as can be seen from the
superimposed eigenvectors of $E\{S\}$.

Figure 11 shows the trajectory of the eigenvectors of the optimal input covariance $Q = U\hat{Q}U^\dagger$ as $\kappa$ varies between 0 and 1
for Mo = (i i) and E — (g i)- The points are plotted by writing the columns of U as two points in K^. The vertical axis 
shows the value of $\kappa$. On the plane $\kappa = 0$, the channel is zero-mean, correlated Gaussian $H \sim \mathcal{N}_{2,2}(0, \Sigma)$. It can be seen
that the power allocation is divided between the eigenvectors of the covariance matrix $\Sigma$. Similarly, on the plane $\kappa = 1$, the
channel is deterministic, with $H = M_0$. The optimal strategy in this case is beamforming. At each end of the plot, the singular
vectors of $M_0$ and $\Sigma$ have been superimposed, for comparison with $Q$.
Fig. 10. Convergence of Algorithm 1 for various covariance matrices $T = \tau\mathbf{1} + \mathrm{diag}\{1 - \tau\}$ (26) and random rank-one mean, $M_0$. Each plot is averaged
over several independent choices of $M_0$. Figure 10(a) shows convergence for $5 \times 5$ matrices, Figure 10(b) shows convergence for $15 \times 15$ matrices and
Figure 10(c) shows convergence for $25 \times 25$ matrices.




Fig. 11. Variation in eigenvectors of optimal $2 \times 2$ covariance matrix $Q$, with $H = \kappa M_0 + (1 - \kappa)X$ as in (41). Eigenvectors of $Q$ are shown dashed. The
eigenvectors of $\Sigma$ are superimposed on the plane $\kappa = 0$ and the eigenvectors of $M_0$ are superimposed on the plane $\kappa = 1$. The eigenvectors of $E\{H^\dagger H\}$
are given as solid lines, superimposed over the dashed lines corresponding to $Q$.
Fig. 12. Power allocation for optimal input covariance, $q_1$ and $q_2$ with $Q = U\hat{Q}U^\dagger$.




D. Asymptotics 

It is interesting to consider the low- and high-SNR asymptotics of the MIMO channel capacity. This has been done by many 
authors. Here we give a brief analysis, and in the spirit of the main result presented above, emphasize the results which hold 
for any pn- 

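Consider the matrix channel (1) and define $S = HQH^\dagger$. By Taylor series expansion, $\Psi$ may be approximated near $\gamma = 0$
by

$$\Psi(Q) \approx \sum_{n \ge 1} (-1)^{n-1}\, \frac{\gamma^n}{n}\, E\{\mathrm{tr}(S^n)\}. \qquad (43)$$

Of particular interest is the first order approximation, $\Psi(Q) \approx \gamma\, \mathrm{tr}(Q\, E\{H^\dagger H\})$.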

Theorem 10 (Low SNR): Consider a matrix channel (1), with $E\{H^\dagger H\} = U\Lambda U^\dagger$, with $U$ unitary and $\Lambda$ diagonal with
$\Lambda = \mathrm{diag}\{\lambda_1, \dots, \lambda_t\}$ and $\lambda_1 = \dots = \lambda_k > \lambda_{k+1} \ge \dots \ge \lambda_t \ge 0$. For low SNR, $\gamma\lambda_1 \ll 1$, the capacity achieving distribution is
$Q = U\hat{Q}U^\dagger$ where $\hat{Q}$ is diagonal and

$$\hat{Q} = \mathrm{diag}\{\underbrace{1/k, \dots, 1/k}_{k\ \text{terms}}, 0, \dots, 0\}$$

and $C = \gamma\lambda_1$.

At low SNR the transmitter only needs to know $E\{H^\dagger H\}$, regardless of the underlying $p_H$. To first order, beamforming in
the direction of the largest eigenvector of $E\{H^\dagger H\}$ is optimal (assuming a unique largest eigenvalue). This aligns with well
known results [14,40].

This result must be taken with care: the approximation is for $\gamma\lambda_1 \ll 1$, so that large channel gains will necessitate a
correspondingly smaller value of $\gamma$ before the expansion is accurate, see for example [14,40].
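
The low-SNR recipe can be sketched in a few lines of Python: estimate $E\{H^\dagger H\}$ from channel samples, take its principal eigenvector as the beamforming direction, and use $\gamma\lambda_1$ as the first-order capacity. The function name `low_snr_beamformer`, the Ricean-like example channel and the sample size are assumptions made for illustration only.

```python
import numpy as np

def low_snr_beamformer(H_samples, gamma):
    """Return the rank-one covariance Q and the first-order capacity gamma * lambda_1.
    Hypothetical helper; E{H^H H} is replaced by a sample average."""
    G = np.mean([H.conj().T @ H for H in H_samples], axis=0)
    eigvals, eigvecs = np.linalg.eigh(G)     # eigenvalues in ascending order
    lam1, u1 = eigvals[-1], eigvecs[:, -1]
    Q = np.outer(u1, u1.conj())              # beamforming covariance with tr(Q) = 1
    return Q, gamma * lam1

# Example: 3 x 2 channel with a fixed mean plus i.i.d. complex Gaussian scattering.
rng = np.random.default_rng(1)
mean = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
H_samples = [mean + (rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))) / np.sqrt(2)
             for _ in range(500)]
gamma = 1e-3
Q_bf, c_first_order = low_snr_beamformer(H_samples, gamma)

# Sample-average mutual information achieved by Q_bf, for comparison with gamma * lambda_1.
rate_bf = float(np.mean([np.linalg.slogdet(np.eye(3) + gamma * H @ Q_bf @ H.conj().T)[1]
                         for H in H_samples]))
```

As the caveat above indicates, the agreement between `rate_bf` and `c_first_order` degrades as $\gamma\lambda_1$ grows, since the neglected second-order term scales with $\gamma^2$.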

For Ricean channels with separable correlation, a closed form result may be obtained. Suppose $H \sim \mathcal{N}_{t,r}(M, R \otimes T)$,
where none of $M$, $R$ or $T$ are assumed to be diagonal, or jointly diagonalizable. From [44, pp. 251], $S = H^\dagger H$ is a quadratic
normal form and

$$E\{H^\dagger H\} = T\,\mathrm{tr}(R) + M^\dagger M. \qquad (44)$$

Thus

$$C(\gamma)\big|_{\gamma \to 0} = \gamma\lambda_1 \qquad (45)$$

where $\lambda_1$ is the largest eigenvalue of $T\,\mathrm{tr}(R) + M^\dagger M$. This makes it clear that the most fortuitous arrangement of $T$ and $M$
is when they share a common largest eigenvector. For $R = I$ and $r \ge t$, (44) is essentially the central Wishart approximation
of Lemma 2. This is not coincidence, since the central Wishart approximation is found by matching the first moment of the
density.

There are several special cases that result in simpler forms for Ai. 

1) In the case of identity transmit covariance $T = I_t$, $\lambda_1 = \mathrm{tr}(R) + \lambda_1(M^\dagger M)$.

2) $M = \alpha I$. Then $\lambda_1 = |\alpha|^2 + \mathrm{tr}(R)\lambda_1(T)$.

3) Weak LOS component, $T\,\mathrm{tr}(R) \gg M^\dagger M$. Then $\lambda_1 = \mathrm{tr}(R)\lambda_1(T) + \epsilon$, where $|\epsilon| \le \lambda_1(M^\dagger M)$. Obviously if $M = 0$,
$\epsilon = 0$.

4) Strong LOS component, $M^\dagger M \gg T\,\mathrm{tr}(R)$. Then $\lambda_1 = \lambda_1(M^\dagger M) + \delta$, where $|\delta| \le \mathrm{tr}(R)\lambda_1(T)$.

5) For $r = t = 2$ it is easy to obtain a closed form solution for $\lambda_1$.

Turning now to the other extreme, for large $z$, $\log(1 + z) \approx \log(z)$, and hence at high SNR,

^tlog7 + logdet(5 + logdet(iJ'''H). 



(46) 



Care must be taken in the definition of "high" SNR. The approximation ( I46> is only valid when ^Qu ^ Xmin, ie. the high 
SNR regime is defined by high received SNR over all modes, not necessarily high transmit power.

Theorem 11 (High SNR): Consider a matrix channel (1) with $H$ a random variable, independent of $Q$. Then the capacity
achieving distribution is $Q = I_t/t$ and the resulting capacity is



for any probability density function $p_H$, provided that $H$ is independent of $Q$.

Theorem 11 holds regardless of the characteristics of the channel. The optimal transmit strategy at high SNR is equal power,
independent white signals. This is not surprising when it is seen that for large received power, the variation in channel strength
is meaningless. From a water-filling perspective, we have a very deep pool, with tiny pebbles on the bottom: allocation of
power is irrelevant. The channel distribution $p_H$ has no effect on the optimal transmission strategy, and only affects the resulting
capacity via the $E\{\log\det(H^\dagger H)\}$ term. This is investigated in much more depth in [56,57].

Note also that at high SNR, $t\log(P/t)$ is asymptotic to the capacity resulting from transmitting independent data across $t$
non-interfering AWGN channels (each channel getting $P/t$ of the available power). The remaining term is either a capacity
loss or gain over this parallel channel scenario, depending on the statistics of the channel. In the case of Wishart matrices,
$H \sim \mathcal{N}_{t,r}(0, R \otimes I)$, (47) has a known closed-form solution [23]. For numerical purposes, $E\{\log\det(H^\dagger H)\}$ may be obtained
by Monte Carlo methods.
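
As a rough illustration of the Monte Carlo evaluation, the Python sketch below compares the sample-average mutual information under equal power, $Q = I_t/t$, with the high-SNR form $t\log(\gamma/t) + E\{\log\det(H^\dagger H)\}$. That closed form is our reading of the approximation above with $Q = I_t/t$ substituted, and the i.i.d. Rayleigh example with $r = 4$, $t = 2$ and the function name `high_snr_terms` are assumptions of this sketch.

```python
import numpy as np

def high_snr_terms(H_samples, gamma):
    """Return (exact, approx): sample averages of log det(I + (gamma/t) H^H H) and of
    t log(gamma/t) + log det(H^H H), both in nats. Hypothetical helper."""
    t = H_samples[0].shape[1]
    exact = np.mean([np.linalg.slogdet(np.eye(t) + (gamma / t) * H.conj().T @ H)[1]
                     for H in H_samples])
    logdet_term = np.mean([np.linalg.slogdet(H.conj().T @ H)[1] for H in H_samples])
    approx = t * np.log(gamma / t) + logdet_term
    return float(exact), float(approx)

# Example: i.i.d. Rayleigh channel with r = 4 receive and t = 2 transmit elements.
rng = np.random.default_rng(0)
H_samples = [(rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))) / np.sqrt(2)
             for _ in range(200)]
exact_rate, approx_rate = high_snr_terms(H_samples, gamma=1e4)
```

Because $\log(1 + z) \ge \log z$, the approximation sits slightly below the exact sample average, and the gap closes as the received SNR on the weakest mode grows.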



IV. Conclusion

This paper has shown how to correctly compute the capacity of multiple-input multiple-output channels whose gain matrices
are chosen independently each symbol interval according to a given matrix density. The optimal input density is Gaussian but 
is not identically distributed over time or space except in special cases. 

In the case of full CSI at the transmitter, the optimal power allocation corresponds to water-pouring in space and time, and is 
performed instantaneously, which is an important practical consideration. At each symbol, the transmitter still performs water 
pouring over the channel eigenvalues at that instant, but uses a water level that results in the long-term average power constraint 
being satisfied. In certain circumstances, this yields a considerable gain in rate, compared to a symbol-wise water-filling, in 
which the transmitter uses a water level that enforces a per-symbol power constraint. The peak-to-average power ratios and 
entire power distribution resulting from the use of the optimal space-time water-filling strategy were also considered. For 
Rayleigh channels, the resulting peak-to-average power ratio can be several decibels, depending upon the average power. 
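
The difference between the two strategies can be sketched numerically in Python. The sketch assumes unit noise variance, i.i.d. $2 \times 2$ Rayleigh channel draws and a simple bisection search for the water level; the function names and parameter values are illustrative and not taken from the paper. Space-time water-filling uses one water level for all symbols (long-term average power $P$), whereas symbol-wise water-filling finds a separate level for each symbol (per-symbol power $P$).

```python
import numpy as np

def waterfill(inv_gains, total_power, n_bisect=100):
    """Powers max(nu - 1/lambda, 0) with the water level nu found by bisection,
    so that the allocated power does not exceed total_power."""
    inv_gains = np.asarray(inv_gains, dtype=float)
    lo, hi = 0.0, inv_gains.max() + total_power
    for _ in range(n_bisect):
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - inv_gains, 0.0).sum() > total_power:
            hi = nu
        else:
            lo = nu
    return np.maximum(lo - inv_gains, 0.0)

def average_rate(eigs, powers):
    """Average over symbols of sum_i log(1 + p_i * lambda_i), in nats."""
    return float(np.mean(np.sum(np.log1p(eigs * powers), axis=1)))

rng = np.random.default_rng(0)
N, r, t, P = 500, 2, 2, 1.0
eigs = np.empty((N, t))
for k in range(N):
    H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
    eigs[k] = np.linalg.svd(H, compute_uv=False) ** 2   # eigenvalues of H^H H

# One water level across all symbols (average power P per symbol).
p_space_time = waterfill(1.0 / eigs.ravel(), N * P).reshape(N, t)
# A separate water level for every symbol (power P in each symbol).
p_per_symbol = np.vstack([waterfill(1.0 / eigs[k], P) for k in range(N)])

rate_space_time = average_rate(eigs, p_space_time)
rate_per_symbol = average_rate(eigs, p_per_symbol)
```

Since every symbol-wise allocation also satisfies the average power constraint, `rate_space_time` cannot fall below `rate_per_symbol`; the per-symbol power `p_space_time.sum(axis=1)` varies from symbol to symbol, which is the source of the peak-to-average power ratio discussed above.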

We have investigated the capacity achieving input covariance in the case where the transmitter has statistical CSI. We have 
presented a method for calculating the optimal input covariance for arbitrary Gaussian vector channels. We have provided an 
iterative algorithm which converges to the optimal input covariance, by considering the covariance in terms of a Cholesky 
factorization. We have demonstrated the algorithm on several difficult channels, where the appropriate "diagonal" Q input 
cannot be readily found by inspection. Although the diagonalizing decomposition $Q = U\hat{Q}U^\dagger$ always exists, we have shown
that the matrix $U$ may be non-trivially related to the pdf of the channel.

For special cases, the optimal input covariance can be a-priori diagonalized by inspection - such as for zero-mean Kronecker 
correlated Rayleigh channels. In such cases we gave a simpler fixed point equation that characterizes the optimal transmit 
covariance. This particular characterization reveals a close link between the optimality condition for deterministic channels 
(water filling) and that for ergodic channels. 

Appendix 
Proofs 

Proof of Theorem 1: The capacity is given by




(47) 



IV. Conclusion 



$$C = \limsup_{N \to \infty} \frac{1}{N}\, I(x_N; y_N \mid H_N). \qquad (48)$$

For fixed $N$ re-write the entire sequence of transmissions (1) as

$$y_N = H_N x_N + z_N. \qquad (49)$$
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For any fixed value of N, the optimal density on xn is obtained by water-filling on the Nm eigenvalues 1^1,^2, ■■■ , VNm of 
Wn = HnHn^ . Thus the optimized information rate for given N is given parametrically by 

CN^j^ (50) 

p-j^ E e-r^ (51) 

Now for a block diagonal matrix such as $W_N$, the $Nm$ eigenvalues are simply the set of all the eigenvalues of the component
diagonal blocks, in this case the $H[k]H^\dagger[k]$. As $N \to \infty$, the distribution of the eigenvalues of $W_N$ converges to the eigenvalue
density $p_\lambda$ associated with $p_H$ and the summations become expectations with respect to a randomly chosen eigenvalue of $HH^\dagger$.

■ 

Proof of Theorem 2: A few observations can be made regarding the distribution of power resulting from the optimal
transmit strategy. Firstly, transmit power is upper-bounded by m£_, since the instantaneous power level on each eigenvector is 
^ — 1/Ai, and > 0. The peak-to-average power ratio (PAPR) is therefore Now from il It . 



= t-ElX-']. 

The inequality is due to the fact that the portion of the integral from to 1/^ is non-positive. Therefore ^ is upper-bounded 



Proof: [Proof: Theorem [S] An optimal Q has eigenvalues with satisfy ilOi . and hence 
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where A — > (since S > and Q > are both Hermitian). Now det{A)A^^ — adj(A) and the diagonal elements of adj(A) 
are determinants of principal minors of A > 0, which are non negative [55, p. 398]. Noting that dv/d^ > completes the 
proof. ■ 
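Proof of Theorem 6: Rank-one transmission with $Q = E_{11}$ is optimal if a reduction in $q_1$ (and corresponding increase
in some other $q_k$) results in an overall decrease in mutual information. From the Kuhn-Tucker conditions (16), (17), the condition
for optimality is (see also [12,21,22,31,49])

$$\left.\frac{\partial \Psi}{\partial q_1}\right|_{Q = E_{11}} \ge \left.\frac{\partial \Psi}{\partial q_k}\right|_{Q = E_{11}}, \quad k \ge 2. \qquad (52)$$

Furthermore, we can restrict attention to $k = 2$ in (52).
Now

$$\frac{\partial \Psi}{\partial q_k}(Q) = E\left\{\left((I + SQ)^{-1} S\right)_{kk}\right\}$$

where $S = \gamma T^{1/2} X^\dagger R X T^{1/2}$ with $X \sim \mathcal{N}_{t,r}(0, I)$.
Now $A = I + SE_{11}$ is of the form

$$A = \begin{pmatrix} 1 + S_{11} & 0_{m-1} \\ b & I_{m-1} \end{pmatrix}$$

where $0_{m-1}$ is an all-zero row vector of length $m - 1$ and $b$ is a column vector of length $m - 1$. We need to find the inner
product between row $k \ge 2$ of $A^{-1}$ and the corresponding column $k$ of $S$. Applying the partitioned matrix inverse theorem
yields

$$A^{-1} = \begin{pmatrix} \dfrac{1}{1 + S_{11}} & 0_{m-1} \\ -\dfrac{b}{1 + S_{11}} & I_{m-1} \end{pmatrix}$$

and hence for $k > 1$,

$$\left.\frac{\partial \Psi}{\partial q_k}\right|_{Q = E_{11}} = E\{S_{kk}\} - E\left\{\frac{|S_{1k}|^2}{1 + S_{11}}\right\} = \gamma r \tau_k - E\left\{\frac{\gamma^2 \tau_1 \tau_k \left|\sum_{i=1}^{r} \rho_i X_{i1}^* X_{ik}\right|^2}{1 + \gamma\tau_1 \sum_{i=1}^{r} \rho_i |X_{i1}|^2}\right\}$$

since (a) $S = S^\dagger$, (b) $E\{S\} = \gamma\,\mathrm{tr}(R)\,T$, and (c)

$$S_{1k} = \gamma\sqrt{\tau_1\tau_k} \sum_{i=1}^{r} \rho_i X_{i1}^* X_{ik}.$$

Similarly, for $k = 1$

$$\left.\frac{\partial \Psi}{\partial q_1}\right|_{Q = E_{11}} = E\left\{\frac{S_{11}}{1 + S_{11}}\right\} = E\left\{\frac{\gamma\tau_1 \sum_{i=1}^{r} \rho_i |X_{i1}|^2}{1 + \gamma\tau_1 \sum_{i=1}^{r} \rho_i |X_{i1}|^2}\right\}.$$

Finally, the expectation with respect to the $X_{ik}$ may be taken, which completes the proof (using the fact that the $X_{ik}$ are
independent of the $X_{i1}$). ■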
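Proof of Theorem 7: We need to compute the expectation (27) where $W = X^\dagger R X$, with $X \sim \mathcal{N}_{r,2}(0, I)$. To that
end, let $u \sim \mathcal{N}_{r,1}(0, I)$ and $v \sim \mathcal{N}_{r,1}(0, I)$ be independent Gaussian random vectors. Then $W_{11} \sim u^\dagger R u$ and $W_{12} \sim u^\dagger R v$.
Noting that $\int_0^\infty e^{-xz}\,dx = 1/z$ (which was also a key step for [48]), the expectation may be written as

$$\begin{aligned} &\int_0^\infty e^{-x}\, E\left\{\exp(-x\gamma\tau_1 u^\dagger R u)\left(u^\dagger R u + \gamma\tau_1 |u^\dagger R v|^2\right)\right\} dx \\ &= \int_0^\infty e^{-x}\, E_u\left\{\exp(-x\gamma\tau_1 u^\dagger R u)\left(u^\dagger R u + \gamma\tau_1 E_v\{|u^\dagger R v|^2\}\right)\right\} dx \\ &= \int_0^\infty e^{-x}\, E_u\left\{\exp(-x\gamma\tau_1 u^\dagger R u)\left(u^\dagger R u + \gamma\tau_1 u^\dagger R^2 u\right)\right\} dx \end{aligned}$$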

since u and v are independent. Now define a; = jTipi, let = (with density e~™0- Writing out the inner products as 
summations and using the properties of the exponential, 

= / e--^(p, + 7Tip?)E{i«,e--'^''"-} J]E{e-™^'"^ }dx 
where the last line is due to the independence of the Wi. Computing the expectations results in 



r poo 
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via partial fraction expansion of the product. Exchanging the order of integration and summation and noting 

Jo {1 + a,x){l + ajx) "'^ 

as defined in the statement of the theorem completes the proof (with a few algebraic re-arrangements). ■ 
Proof of Lemma 3: We only consider entries in the upper-triangular (non-zero) part of $\Gamma$, $d_{i \le j}$. We need $Q = \Gamma^\dagger\Gamma$
with $\mathrm{tr}(Q) = \sum_{i \le j} (d_{ij})^2 = 1$ and the diagonal elements of $\Gamma \ge 0$. We will minimize the negative of $f(\nu)$: Minimize $-f(\nu)$
subject to

i<j 

gi = -Vii < 



Create a modified cost function J{iy, /i, </)) to be minimized, given by 

t 



\i<j J j=i 

We wish to find min^ J [/(i^), /i^]. The first step is to find the conditions for the optimal point iy° to be a minimum 
From [52, 53, 58] must satisfy 

1) J [/(^))M;</'] is stationary at the optimal point 

2) Hiki{i''^) = for every constraint fci(i^) 

3) ^l^>0 yi. 

4) If /ii 7^ then constraint fci(i^) = 
From item 1, 



where Sij is the Kronecker Delta, Sij — 1 for i = j. Rearranging J53> gives: 



= 2^vu, vu > 0,/i > 
< v,, = Q 



(53) 



(54) 

(55) 
(56) 



Proof of Theorem 8: For a channel (1) where $H$ is defined by an arbitrary pdf, and the receiver has full knowledge of
$H$, whilst the transmitter has statistical knowledge, the input distribution is known to be Gaussian with certain covariance [59].
Thus it remains to find the optimal covariance $Q^\circ$ of the Gaussian input signal.

Before applying Lemma 3 we must show that $\log\det(I + MX^\dagger XM^\dagger)$ is convex $\cap$ on any positive definite matrix $X$,
which implies $\Psi(X^\dagger X, p(S))$ is convex $\cap$ on any positive triangular matrix as we require. Applying a variation of [55,
pp. 466-467],

$$\begin{aligned} &\log\det\left(I + M(\alpha A + (1-\alpha)B)^\dagger(\alpha A + (1-\alpha)B)M^\dagger\right) \\ &\ge \log\det\left(I + \alpha^2 MA^\dagger AM^\dagger + (1-\alpha)^2 MB^\dagger BM^\dagger\right) \\ &= \log\det\left(\alpha I + \alpha^2 MA^\dagger AM^\dagger + (1-\alpha)I + (1-\alpha)^2 MB^\dagger BM^\dagger\right) \\ &\ge \alpha\log\det\left(I + \alpha MA^\dagger AM^\dagger\right) + (1-\alpha)\log\det\left(I + (1-\alpha)MB^\dagger BM^\dagger\right) \end{aligned}$$

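The result of Theorem 8 is given by applying Lemma 3 to the (convex $\cap$) function $f(d) = \Psi(Q = \Gamma^\dagger\Gamma, p(S))$. The matrix
$Q$ may now be full, but remains positive semi-definite. Substituting $X(d) = \Gamma^\dagger\Gamma$,

$$\begin{aligned} \frac{\partial \Psi(\Gamma^\dagger\Gamma, p(S))}{\partial d_{ij}} &= E_S\left\{\mathrm{tr}\left[\frac{\partial \log\det(I + SX)}{\partial X}\, \frac{\partial X}{\partial d_{ij}}\right]\right\} \\ &= E_S\left\{\mathrm{tr}\left[(I + SX)^{-1} S\, \frac{\partial X}{\partial d_{ij}}\right]\right\} \\ &= E_S\left\{\mathrm{tr}\left[(I + S\Gamma^\dagger\Gamma)^{-1} S\, \frac{\partial\, \Gamma^\dagger\Gamma}{\partial d_{ij}}\right]\right\} \end{aligned}$$

Since $\Gamma$ and $S$ are independent, the trace, expectation and differentiation all commute, and the expression in terms of $\partial X/\partial d_{ij}$ arises from
application of the matrix chain rule. Observe that $df(X(t))/dt = \mathrm{tr}(df(X)/dX \cdot dX/dt)$. Define $E^{(ij)}$ as the matrix of partial
derivatives of $\Gamma^\dagger\Gamma$ with respect to $d_{ij}$. In general this matrix is full.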



ddij ddij 



The channel capacity is also known to be the expectation of = F^F, = H) over S, with Gaussian input [59]. 

■ 

Proof of Theorem 9: The algorithm is a gradient descent algorithm on a convex problem. ■
Proof of Theorem 10: The optimization may be written as

$$C = \max_{\mathrm{tr}(Q) = 1} \sum_{i=1}^{t} E_H\{\log(1 + \gamma\sigma_i)\} \qquad (57)$$

where $\sigma_i$ is the $i$-th largest singular value of $S = HQH^\dagger$. Taylor expansion of (57) around $\gamma = 0$ gives:

$$C = \max_{\mathrm{tr}(Q) = 1} \sum_{i=1}^{t} E_H\{\gamma\sigma_i\} = \max_{\mathrm{tr}(Q) = 1} \gamma\, E_H\{\mathrm{tr}(HQH^\dagger)\}$$

It now remains to find the capacity achieving distribution. Note, for any Hermitian matrices $A$ and $B$ with eigenvalues
$a_1 \ge \dots \ge a_n$ and $b_1 \ge \dots \ge b_n$,

$$\mathrm{tr}(AB) \le \sum_i a_i b_i$$

with equality if $A$ and $B$ are jointly diagonalizable [51]. With $A = Q$ and $B = E\{H^\dagger H\}$ the capacity achieving distribution
diagonalizes $E\{H^\dagger H\}$. Apply Definition 2 to give
diagonalizes EjiJiJ^}. Apply Definition [2 to give 



~ — — 

dQii 



Since we require /i constant for all non-zero Qu, the only valid solution is 

^""{0 else 

for distinct A^, which gives and substituting for (I57> gives the desired result. 

For k equal eigenvalues the unique solution becomes /i = 1/k, which gives the desired result. ■ 
Proof of Theorem 11: Starting from the definition of high SNR, note that $I(Q, \gamma)$ is dependent on $Q$ only through
the eigenvalues of $Q$, and not through any interaction with $H$. Using a Lagrange-multiplier method, and differentiating (46)
with respect to $Q_{ii}$, gives:

$$\frac{1}{Q_{ii}} = \mu, \quad Q_{ii} > 0$$

with the only solution $Q_{ii} = 1/\mu = 1/t$.
Substituting in (46) gives (47).
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